Hard to Cheat: A Turing Test based on Answering Questions about Images
Progress in language and image understanding by machines has sparked the
interest of the research community in more open-ended, holistic tasks, and
refueled an old AI dream of building intelligent machines. We discuss a few
prominent challenges that characterize such holistic tasks and argue for
"question answering about images" as a particularly appealing instance of such a
holistic task. In particular, we point out that it is a version of a Turing
Test that is likely to be more robust to over-interpretations and contrast it
with tasks like grounding and generation of descriptions. Finally, we discuss
tools to measure progress in this field.
Comment: Presented in AAAI-15 Workshop: Beyond the Turing Tes
Spatio-Temporal Image Boundary Extrapolation
Boundary prediction in images as well as video has been a very active topic
of research and organizing visual information into boundaries and segments is
believed to be a cornerstone of visual perception. While prior work has
focused on predicting boundaries for observed frames, our work aims at
predicting boundaries of future unobserved frames. This requires our model to
learn about the fate of boundaries and extrapolate motion patterns. We
experiment on an established real-world video segmentation dataset, which provides
a testbed for this new task. We show for the first time spatio-temporal
boundary extrapolation in this challenging scenario. Furthermore, we show
long-term prediction of boundaries in situations where the motion is governed
by the laws of physics. We successfully predict boundaries in a billiard
scenario without any assumptions of a strong parametric model or any object
notion. We argue that our model, under minimal modeling assumptions, has derived
a notion of 'intuitive physics' that can be applied to novel scenes.
Ask Your Neurons: A Neural-based Approach to Answering Questions about Images
We address a question answering task on real-world images that is set up as a
Visual Turing Test. By combining the latest advances in image representation and
natural language processing, we propose Neural-Image-QA, an end-to-end
formulation to this problem for which all parts are trained jointly. In
contrast to previous efforts, we are facing a multi-modal problem where the
language output (answer) is conditioned on visual and natural language input
(image and question). Our approach Neural-Image-QA doubles the performance of
the previous best approach on this problem. We provide additional insights into
the problem by analyzing how much information is contained only in the language
part for which we provide a new human baseline. To study human consensus, which
is related to the ambiguities inherent in this challenging task, we propose two
novel metrics and collect additional answers that extend the original DAQUAR
dataset to DAQUAR-Consensus.
Comment: ICCV'15 (Oral
Comment le roman peut-il être «wagnérien»? Le cas d'Élémir Bourges
The paper focuses on the connections between the novel Le Crépuscule des dieux (1884) by Bourges and Wagner's Ring cycle: their presence is traced in the subject matter of the work and at the level of character construction and composition. The author argues that Bourges is the most complete example of the influence of Wagnerian aesthetics on the French novel.
„Théodore de Banville: artysta czy rzemieślnik?”
This paper reflects on the status of Théodore de Banville in the literary pantheon by attempting to determine the proportions of art and artisanry in his poetic work. While Banville deserves to be called an artist, provided the term is understood according to its 19th-century definition, the name of artisan, so often applied to him by critics and literary historians, may also be justified. Looking at Banville's work more closely, analyzing the role of creative inspiration and patiently built text structure, observing the importance the author attached to versification techniques, his love of technical difficulty, his meticulousness and his skill as a craftsman, and also the frequency with which he creates images and metaphors referring to what were formerly called the 'mechanical arts', in short, taking into account everything that belongs to the poet's craft, the answer to the question posed in the title, whether Banville was an artist or an artisan, is that he was both.
«L'Idéal sous les voiles de l’électricité». À propos de «L'Ève future» de Villiers de l’Isle-Adam
The article discusses the extremely idealistic message of the 1886 novel L’Ève future by Villiers de l’Isle-Adam. The motif of an artificial woman constructed by a genius scientist is a pretext for philosophical and metaphysical meditation on human fate, an attempt to reach, through art, the deepest mysteries of the spiritual world set against shallow visible reality. The subject matter and rhetoric of the work are an expressive illustration of symbolism in the French novel.
Adapting Visual Question Answering Models for Enhancing Multimodal Community Q&A Platforms
Question categorization and expert retrieval methods have been crucial for
information organization and accessibility in community question & answering
(CQA) platforms. Research in this area, however, has dealt with only the text
modality. With the increasing multimodal nature of web content, we focus on
extending these methods for CQA questions accompanied by images. Specifically,
we leverage the success of representation learning for text and images in the
visual question answering (VQA) domain, and adapt the underlying concept and
architecture for automated category classification and expert retrieval on
image-based questions posted on Yahoo! Chiebukuro, the Japanese counterpart of
Yahoo! Answers.
To the best of our knowledge, this is the first work to tackle the
multimodality challenge in CQA, and to adapt VQA models for tasks on a more
ecologically valid source of visual questions. Our analysis of the differences
between visual QA and community QA data drives our proposal of novel
augmentations of an attention method tailored for CQA, and use of auxiliary
tasks for learning better grounding features. Our final model markedly
outperforms the text-only and VQA model baselines for both tasks of
classification and expert retrieval on real-world multimodal CQA data.
Comment: Submitted for review at CIKM 201
Long-Term Image Boundary Prediction
Boundary estimation in images and videos has been a very active topic of
research, and organizing visual information into boundaries and segments is
believed to be a cornerstone of visual perception. While prior work has
focused on estimating boundaries for observed frames, our work aims at
predicting boundaries of future unobserved frames. This requires our model to
learn about the fate of boundaries and corresponding motion patterns --
including a notion of "intuitive physics". We experiment on natural video
sequences along with synthetic sequences with deterministic physics-based and
agent-based motions. Although it is not our primary goal, we also show that fusion
of RGB and boundary prediction leads to improved RGB predictions.
Comment: Accepted in the AAAI Conference for Artificial Intelligence, 201